CNTRLPLANE-2549:test/e2e: migrate refresh-CA test for OTE compatibility #306

Open
wangke19 wants to merge 5 commits into openshift:main from wangke19:ote-migrate-refresh-ca

@wangke19 wangke19 commented Jan 16, 2026

Summary

Split the OTE suite into stable and disruptive suites to fix CI failures caused by the refresh-CA test's cluster-wide TLS disruption.

Problem

The CI job e2e-aws-operator-serial-ote fails because the refresh-CA test deletes the cluster's service CA signing key (signing-key secret in openshift-service-ca), causing cluster-wide TLS disruption. The OTE framework runs openshift-tests monitors alongside the tests; these monitors detect the disruption and report it as failures. The current suite does not set ClusterStability, so it defaults to Stable (zero disruption expected).

Solution

Split into two suites:

  • Stable suite (openshift/service-ca-operator/operator/serial): 13 non-disruptive tests with strict cluster health monitoring (ClusterStability: Stable)
  • Disruptive suite (openshift/service-ca-operator/operator/serial-disruptive): 1 test (refresh-CA) with relaxed monitor thresholds (ClusterStability: Disruptive)

Changes

1. Tag refresh-CA test as [Disruptive] in test/e2e/e2e.go

// Before:
g.It("[Operator][Serial] should regenerate serving certs and configmaps when CA is deleted and recreated", func() {

// After:
g.It("[Operator][Serial][Disruptive] should regenerate serving certs and configmaps when CA is deleted and recreated", func() {

2. Split into two suites in cmd/service-ca-operator-tests-ext/main.go

Replace the single suite with two:

// Non-disruptive tests run with strict cluster health monitoring.
extension.AddSuite(oteextension.Suite{
    Name:             "openshift/service-ca-operator/operator/serial",
    Parallelism:      1,
    ClusterStability: oteextension.ClusterStabilityStable,
    Qualifiers: []string{
        `name.contains("[Operator]") && name.contains("[Serial]") && !name.contains("[Disruptive]")`,
    },
})

// Disruptive tests (e.g. CA rotation) that cause expected cluster-wide TLS disruption.
// Monitors will relax thresholds for this suite.
extension.AddSuite(oteextension.Suite{
    Name:             "openshift/service-ca-operator/operator/serial-disruptive",
    Parallelism:      1,
    ClusterStability: oteextension.ClusterStabilityDisruptive,
    Qualifiers: []string{
        `name.contains("[Operator]") && name.contains("[Serial]") && name.contains("[Disruptive]")`,
    },
})
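
The routing implied by the two qualifier expressions can be sketched with plain string checks. This is only an emulation for illustration: it does not call the OTE library or evaluate the expressions themselves, suiteFor is a hypothetical helper, and the second and third test names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// suiteFor mirrors the two qualifier expressions with plain string checks.
// Hypothetical helper: it returns "" when a test name matches neither suite.
func suiteFor(name string) string {
	if !strings.Contains(name, "[Operator]") || !strings.Contains(name, "[Serial]") {
		return ""
	}
	if strings.Contains(name, "[Disruptive]") {
		return "openshift/service-ca-operator/operator/serial-disruptive"
	}
	return "openshift/service-ca-operator/operator/serial"
}

func main() {
	names := []string{
		"[Operator][Serial][Disruptive] should regenerate serving certs and configmaps when CA is deleted and recreated",
		"[Operator][Serial] example non-disruptive test", // hypothetical name
		"[Operator] example test without the Serial tag", // hypothetical name
	}
	for _, n := range names {
		fmt.Printf("%s\n  -> %q\n", n, suiteFor(n))
	}
}
```

Because the two expressions differ only in the sign of the [Disruptive] check, every [Operator][Serial] test lands in exactly one suite, and adding the tag to a test name is enough to move it.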

Files Modified

File | Change
cmd/service-ca-operator-tests-ext/main.go | Split single suite into stable + disruptive suites
test/e2e/e2e.go | Add [Disruptive] tag to refresh-CA test name

Verification Results

Build

  • make build - Both service-ca-operator and service-ca-operator-tests-ext binaries compiled successfully

OTE Suite Verification

  • Suite registration - list suites confirmed two suites with correct ClusterStability settings
  • Test routing - list tests confirmed 14 total tests; refresh-CA tagged [Disruptive] routed to disruptive suite

Suite breakdown:

  • Stable suite (openshift/service-ca-operator/operator/serial): 13 non-disruptive tests, ClusterStability: Stable
  • Disruptive suite (openshift/service-ca-operator/operator/serial-disruptive): 1 test (refresh-CA), ClusterStability: Disruptive

Test Execution

Mode Tests Found Result Failure Reason
OTE stable suite 13 tests (non-disruptive) Cluster unavailable (Service Unavailable)
OTE disruptive suite 1 test (refresh-CA) Cluster unavailable (Service Unavailable)

Note: All test execution failures are due to the target cluster being unreachable (Service Unavailable). This is an infrastructure issue, not a code issue. The suite split and test routing are verified correct.

Related Work

CI Configuration

A companion PR has been created in openshift/release to add the new CI job for the disruptive suite:

Future Work

Future CA rotation test migrations (time-based-ca-rotation, forced-ca-rotation) should also use the [Disruptive] tag so they are automatically routed to the disruptive suite.

Commit

  • Branch: ote-migrate-refresh-ca
  • Commit: e60fe227d
  • Message: test/e2e: split OTE suite into stable and disruptive for refresh-CA
  • Status: Pushed to origin/ote-migrate-refresh-ca

Migrates the refresh-CA test to the OTE (OpenShift Test Extended)
Ginkgo framework while maintaining dual-compatibility with traditional
Go tests.

Changes:
- Add Ginkgo test context in e2e.go for OTE test discovery:
  * refresh-CA - Tests CA regeneration when deleted and recreated
- Extract test logic into shared function with testing.TB interface:
  * testRefreshCA() - Verifies serving certs and configmaps update
    when CA secret is deleted and recreated
- Add helper functions in e2e.go:
  * pollForCABundleInjectionConfigMapWithReturn() - Polls for configmap
  * pollForCARecreation() - Polls for CA secret recreation
- Keep test runner in e2e_test.go that calls shared function
- Remove duplicate helper functions from e2e_test.go

Net change: +30 lines (132 added - 102 removed)

Both test frameworks continue to work:
- Standard Go test: go test -run "^TestE2E$/^refresh-CA$"
- OTE: Ginkgo test discovery via service-ca-operator-tests-ext

coderabbitai bot commented Jan 16, 2026

Walkthrough

Adds a consolidated end-to-end CA refresh test (testRefreshCA) and polling helpers to test/e2e/e2e.go, replaces inline test steps in test/e2e/e2e_test.go, and updates test-suite selection logic in cmd/service-ca-operator-tests-ext/main.go to separate disruptive tests.

Changes

Cohort / File(s) Summary
E2E tests — new test and helpers
test/e2e/e2e.go
Introduces testRefreshCA and polling helpers (pollForCABundleInjectionConfigMapWithReturn, pollForCARecreation) that create test namespaces/services/certs, delete the CA signing-key secret, wait for CA recreation, and validate updated secrets and ConfigMap data.
E2E tests — test invocation and cleanup
test/e2e/e2e_test.go
Removes previously inlined CA-refresh polling helpers and test logic, replaces inline steps with a call to testRefreshCA in the "refresh-CA" subtest, and removes a duplicate headless-stateful-serving-cert-secret-delete-data test block.
Test runner / suites
cmd/service-ca-operator-tests-ext/main.go
Modifies test selection: excludes disruptive tests from the default serial suite and adds a separate serial-disruptive suite (ClusterStability = Disruptive) that targets tests tagged [Operator], [Serial], and [Disruptive].

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

No actionable comments were generated in the recent review. 🎉



openshift-ci bot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 16, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/e2e/e2e.go`:
- Around line 1230-1317: The test captures configmapCopy before verifying the
injected CA data and calls pollForSecretChangeGinkgo for the headless secret
with no keys (which returns immediately); fix by moving the configmapCopy =
configmap.DeepCopy() to after checkConfigMapCABundleInjectionData(adminClient,
testConfigMapName, ns.Name) so the baseline includes injected data, and call
pollForSecretChangeGinkgo(t, adminClient, headlessSecretCopy, v1.TLSCertKey,
v1.TLSPrivateKeyKey) (same keys used for the regular secret) so the headless
secret change is actually validated; locate symbols
pollForServiceServingSecretWithReturn, configmapCopy,
checkConfigMapCABundleInjectionData, pollForConfigMapChange, headlessSecretCopy,
and pollForSecretChangeGinkgo to make the edits.
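
Why a change check with no keys can succeed immediately is easy to see in a small sketch. The real pollForSecretChangeGinkgo is not shown in this conversation, so the code below assumes it compares a variadic list of keys in a loop; allKeysChanged is a hypothetical stand-in. The key names "tls.crt" and "tls.key" are the values of v1.TLSCertKey and v1.TLSPrivateKeyKey.

```go
package main

import (
	"bytes"
	"fmt"
)

// allKeysChanged reports whether every listed key differs between old and cur.
// With an empty key list the loop body never runs, so the result is true
// even though nothing was compared. Hypothetical stand-in for the real helper.
func allKeysChanged(old, cur map[string][]byte, keys ...string) bool {
	for _, k := range keys {
		if bytes.Equal(old[k], cur[k]) {
			return false
		}
	}
	return true
}

func main() {
	old := map[string][]byte{"tls.crt": []byte("cert-v1"), "tls.key": []byte("key-v1")}
	cur := map[string][]byte{"tls.crt": []byte("cert-v1"), "tls.key": []byte("key-v1")} // unchanged

	fmt.Println(allKeysChanged(old, cur))                       // true: nothing was compared
	fmt.Println(allKeysChanged(old, cur, "tls.crt", "tls.key")) // false: data is unchanged
}
```

Passing the same keys for the headless secret as for the regular secret makes the check fail until the data has really been regenerated.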
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 503e4f5 and 952a380.

📒 Files selected for processing (2)
  • test/e2e/e2e.go
  • test/e2e/e2e_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • test/e2e/e2e.go
  • test/e2e/e2e_test.go
🧬 Code graph analysis (1)
test/e2e/e2e.go (1)
pkg/controller/api/api.go (1)
  • InjectionDataKey (29-29)
🔇 Additional comments (3)
test/e2e/e2e_test.go (1)

697-703: Clean delegation to shared refresh-CA test.

Nice consolidation to the shared testRefreshCA path for OTE compatibility.

test/e2e/e2e.go (2)

106-111: refresh-CA Ginkgo context integration looks good.

Clear test discovery wiring for OTE.


1319-1349: Polling helpers are straightforward and consistent.

Timeout usage and error handling align with the other poll helpers.

wangke19 changed the title from "test/e2e: migrate refresh-CA test for OTE compatibility" to "CNTRLPLANE-2549:test/e2e: migrate refresh-CA test for OTE compatibility" Jan 16, 2026
openshift-ci-robot added the "jira/valid-reference" label (indicates that this PR references a valid Jira ticket of any type) Jan 16, 2026

openshift-ci-robot commented Jan 16, 2026

@wangke19: This pull request references CNTRLPLANE-2549 which is a valid jira issue.

Details

In response to this:

Summary

Migrates the refresh-CA test to the OTE (OpenShift Test Extended) Ginkgo framework while maintaining dual-compatibility with traditional Go tests.

This PR continues the systematic migration of e2e tests to support OTE test discovery, following the pattern established in previous PRs (#297-#305).

Changes

test/e2e/e2e.go (+127 lines)

  • Add Ginkgo test context for OTE test discovery:
      • refresh-CA - Tests CA regeneration when deleted and recreated
  • Extract test logic into shared function with testing.TB interface:
      • testRefreshCA() - Verifies serving certs and configmaps update when CA secret is deleted and recreated
  • Add helper functions:
      • pollForCABundleInjectionConfigMapWithReturn() - Polls for CA bundle injection configmap
      • pollForCARecreation() - Polls for CA secret recreation after deletion

test/e2e/e2e_test.go (-102 lines)

  • Replace inline test code (74 lines) with test runner calling shared function
  • Remove duplicate helper functions:
      • pollForCABundleInjectionConfigMapWithReturn()
      • pollForCARecreation()

Net change: +30 lines (132 added - 102 removed)

The net +30 lines consist of:

  1. Ginkgo Context wrapper (~6 lines) - OTE test discovery
  2. Comments and documentation (~7 lines) - Better documentation
  3. Two helper functions (~17 lines) - Now reusable from both files

The migration moved 95 lines from e2e_test.go → e2e.go and added 30 lines of new infrastructure (Ginkgo wrapper + better organization).

Test Verification

Traditional Go Test

$ go test -v -run "^TestE2E$/^refresh-CA$" ./test/e2e/ -timeout 15m
=== RUN   TestE2E
=== RUN   TestE2E/refresh-CA
--- PASS: TestE2E (76.51s)
   --- PASS: TestE2E/refresh-CA (74.79s)
PASS
ok      github.com/openshift/service-ca-operator/test/e2e      76.549s

OTE Binary

$ go build -o /tmp/service-ca-operator-tests-ext ./test/e2e/
# Binary compiles successfully (ready for OTE test discovery)

Dual Compatibility

Both test frameworks continue to work:

  • Standard Go test: go test -run "^TestE2E$/^refresh-CA$"
  • OTE: Ginkgo test discovery via service-ca-operator-tests-ext

Related PRs

Part of the OTE migration effort:

Next Steps

After this PR, remaining tests to migrate:

  • time-based-ca-rotation
  • forced-ca-rotation
  • apiservice-ca-bundle-injection
  • crd-ca-bundle-injection
  • mutatingwebhook-ca-bundle-injection
  • validatingwebhook-ca-bundle-injection

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

wangke19 changed the title from "CNTRLPLANE-2549:test/e2e: migrate refresh-CA test for OTE compatibility" to "[WIP]CNTRLPLANE-2549:test/e2e: migrate refresh-CA test for OTE compatibility" Jan 16, 2026
openshift-ci bot added the "do-not-merge/work-in-progress" label (indicates that a PR should not merge because it is a work in progress) Jan 16, 2026
@wangke19
Contributor Author

/test e2e-aws-operator-serial-ote

2 similar comments

Move headless-stateful-serving-cert-secret-delete-data context to correct position.
It should run after serving-cert-secret-delete-data and before ca-bundle tests.

This fixes the test execution sequence to match e2e_test.go order:
1. serving-cert-secret-delete-data
2. headless-stateful-serving-cert-secret-delete-data
3. ca-bundle-injection-configmap
4. ca-bundle-injection-configmap-update
5. vulnerable-legacy-ca-bundle-injection-configmap
6. metrics
7. refresh-CA

Add g.Ordered decorator to Ginkgo Describe block to guarantee tests
run in declaration order.

Without g.Ordered, Ginkgo randomizes test execution order by default,
even with Parallelism: 1. This causes OTE tests to fail because the
tests have state dependencies and must run in the exact order defined
in e2e_test.go.

The g.Ordered decorator ensures tests execute in this sequence:
1. serving-cert-annotation
2. serving-cert-secret-modify-bad-tlsCert
3. serving-cert-secret-add-data
4. serving-cert-secret-delete-data
5. headless-stateful-serving-cert-secret-delete-data
6. ca-bundle-injection-configmap
7. ca-bundle-injection-configmap-update
8. vulnerable-legacy-ca-bundle-injection-configmap
9. metrics
10. refresh-CA

This matches the execution order in traditional Go tests.

openshift-ci-robot commented Jan 20, 2026

@wangke19: This pull request references CNTRLPLANE-2549 which is a valid jira issue.

Details

In response to this:

Summary

Migrates the refresh-CA test to the OTE (OpenShift Test Extended) Ginkgo framework while maintaining dual-compatibility with traditional Go tests.

This PR continues the systematic migration of e2e tests to support OTE test discovery, following the pattern established in previous PRs (#297-#305).

Critical Fix: Added g.Ordered for Test Execution Order

Problem: OTE's BuildExtensionTestSpecsFromOpenShiftGinkgoSuite() does not preserve g.Ordered or g.Serial decorators. Without g.Ordered, Ginkgo randomizes test execution order by default, even with Parallelism: 1 in the suite configuration.

Impact: Tests with state dependencies fail in OTE CI because they run in random order instead of the intended sequential order.

Solution: Added g.Ordered decorator to the Describe block to guarantee tests execute in declaration order, matching the execution sequence in traditional Go tests.

Changes

test/e2e/e2e.go

  • Add g.Ordered decorator to enforce sequential test execution
  • Fix test context order: Move headless-stateful-serving-cert-secret-delete-data to correct position (position 5, after serving-cert-secret-delete-data)
  • Add Ginkgo test context for OTE test discovery:
      • refresh-CA - Tests CA regeneration when deleted and recreated
  • Extract test logic into shared function with testing.TB interface:
      • testRefreshCA() - Verifies serving certs and configmaps update when CA secret is deleted and recreated
  • Add helper functions:
      • pollForCABundleInjectionConfigMapWithReturn() - Polls for CA bundle injection configmap
      • pollForCARecreation() - Polls for CA secret recreation after deletion

test/e2e/e2e_test.go

  • Replace inline test code (74 lines) with test runner calling shared function
  • Remove duplicate helper functions:
      • pollForCABundleInjectionConfigMapWithReturn()
      • pollForCARecreation()
  • Remove unused imports

Test Execution Order (Guaranteed by g.Ordered)

1. serving-cert-annotation
2. serving-cert-secret-modify-bad-tlsCert
3. serving-cert-secret-add-data
4. serving-cert-secret-delete-data
5. headless-stateful-serving-cert-secret-delete-data  ✅ Fixed position
6. ca-bundle-injection-configmap
7. ca-bundle-injection-configmap-update
8. vulnerable-legacy-ca-bundle-injection-configmap
9. metrics
10. refresh-CA  ✅ This test

This order matches the execution sequence in e2e_test.go and ensures state dependencies are preserved.

Testing

Both test frameworks work correctly:

  • ✅ Standard Go test: go test -run "^TestE2E$/^refresh-CA$"
  • ✅ OTE: service-ca-operator-tests-ext run-test "refresh-CA"

Technical Details

Why g.Ordered is Required:

  • Ginkgo v2 randomizes test order by default (to detect test interdependencies)
  • Parallelism: 1 only means "run one test at a time", NOT "run in declaration order"
  • OTE's spec transformation doesn't preserve ordering semantics
  • Our tests have intentional state dependencies (CA creation, rotation, deletion)
  • g.Ordered explicitly enforces declaration order for all nested Context/It blocks
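
The difference between "one test at a time" and "declaration order" can be shown with a toy model. The sketch below uses neither Ginkgo nor OTE; the step names are taken from the list above, but the dependencies between them (the needs and provides fields) are hypothetical and exist only to make the point.

```go
package main

import "fmt"

// step is a toy test: it needs some state to exist and provides one piece of state.
type step struct {
	name     string
	needs    []string
	provides string
}

// runSerially runs steps strictly one at a time in the given order and
// fails on the first step whose dependency has not been provided yet.
func runSerially(steps []step) error {
	state := map[string]bool{}
	for _, s := range steps {
		for _, n := range s.needs {
			if !state[n] {
				return fmt.Errorf("%s ran before %q was ready", s.name, n)
			}
		}
		state[s.provides] = true
	}
	return nil
}

func main() {
	// Hypothetical dependencies, for illustration only.
	declared := []step{
		{name: "serving-cert-annotation", provides: "cert"},
		{name: "serving-cert-secret-add-data", needs: []string{"cert"}, provides: "data"},
		{name: "serving-cert-secret-delete-data", needs: []string{"data"}, provides: "deleted"},
	}
	// One possible shuffled order; execution is still serial.
	shuffled := []step{declared[2], declared[0], declared[1]}

	fmt.Println(runSerially(declared)) // <nil>
	fmt.Println(runSerially(shuffled)) // non-nil: a dependency was not ready
}
```

Both runs execute one step at a time, so serial execution alone does not protect state-dependent tests; only a fixed order does.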

Related

The CI job e2e-aws-operator-serial-ote fails because the refresh-CA
test deletes the cluster service CA signing key, causing cluster-wide
TLS disruption that OTE monitors detect as failures. Split the single
suite into a Stable suite (strict monitoring) and a Disruptive suite
(relaxed thresholds) so monitors adjust expectations appropriately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
openshift-ci bot removed the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) Feb 11, 2026
@wangke19
Contributor Author

Associated PR openshift/release#74807

wangke19 changed the title from "[WIP]CNTRLPLANE-2549:test/e2e: migrate refresh-CA test for OTE compatibility" to "CNTRLPLANE-2549:test/e2e: migrate refresh-CA test for OTE compatibility" Feb 13, 2026
openshift-ci bot removed the "do-not-merge/work-in-progress" label (indicates that a PR should not merge because it is a work in progress) Feb 13, 2026

openshift-ci-robot commented Feb 13, 2026

@wangke19: This pull request references CNTRLPLANE-2549 which is a valid jira issue.

Details

In response to this:

Summary

Split the OTE suite into stable and disruptive suites to fix CI failures caused by the refresh-CA test's cluster-wide TLS disruption.

Problem

The CI job e2e-aws-operator-serial-ote fails because the refresh-CA test deletes the cluster's service CA signing key (signing-key secret in openshift-service-ca), causing cluster-wide TLS disruption. The OTE framework runs openshift-tests monitors alongside the tests that detect this disruption and report failures. The current suite does not set ClusterStability, which defaults to Stable (zero disruption expected).

Solution

Split into two suites:

  • Stable suite (openshift/service-ca-operator/operator/serial): 13 non-disruptive tests with strict cluster health monitoring (ClusterStability: Stable)
  • Disruptive suite (openshift/service-ca-operator/operator/serial-disruptive): 1 test (refresh-CA) with relaxed monitor thresholds (ClusterStability: Disruptive)

Changes

1. Tag refresh-CA test as [Disruptive] in test/e2e/e2e.go

// Before:
g.It("[Operator][Serial] should regenerate serving certs and configmaps when CA is deleted and recreated", func() {

// After:
g.It("[Operator][Serial][Disruptive] should regenerate serving certs and configmaps when CA is deleted and recreated", func() {

2. Split into two suites in cmd/service-ca-operator-tests-ext/main.go

Replace the single suite with two:

// Non-disruptive tests run with strict cluster health monitoring.
extension.AddSuite(oteextension.Suite{
   Name:             "openshift/service-ca-operator/operator/serial",
   Parallelism:      1,
   ClusterStability: oteextension.ClusterStabilityStable,
   Qualifiers: []string{
       `name.contains("[Operator]") && name.contains("[Serial]") && !name.contains("[Disruptive]")`,
   },
})

// Disruptive tests (e.g. CA rotation) that cause expected cluster-wide TLS disruption.
// Monitors will relax thresholds for this suite.
extension.AddSuite(oteextension.Suite{
   Name:             "openshift/service-ca-operator/operator/serial-disruptive",
   Parallelism:      1,
   ClusterStability: oteextension.ClusterStabilityDisruptive,
   Qualifiers: []string{
       `name.contains("[Operator]") && name.contains("[Serial]") && name.contains("[Disruptive]")`,
   },
})

Files Modified

File Change
cmd/service-ca-operator-tests-ext/main.go Split single suite into stable + disruptive suites
test/e2e/e2e.go Add [Disruptive] tag to refresh-CA test name

Verification Results

Build

  • make build - Both service-ca-operator and service-ca-operator-tests-ext binaries compiled successfully

OTE Suite Verification

  • Suite registration - list suites confirmed two suites with correct ClusterStability settings
  • Test routing - list tests confirmed 14 total tests; refresh-CA tagged [Disruptive] routed to disruptive suite

Suite breakdown:

  • Stable suite (openshift/service-ca-operator/operator/serial): 13 non-disruptive tests, ClusterStability: Stable
  • Disruptive suite (openshift/service-ca-operator/operator/serial-disruptive): 1 test (refresh-CA), ClusterStability: Disruptive

Test Execution

Mode Tests Found Result Failure Reason
go test ./test/e2e/... 1 test function (TestE2E) Cluster unavailable (Service Unavailable)
OTE stable suite 13 tests (non-disruptive) Cluster unavailable (Service Unavailable)
OTE disruptive suite 1 test (refresh-CA) Cluster unavailable (Service Unavailable)

Note: All test execution failures are due to the target cluster (api.ci-ln-i0dvkrb-76ef8.aws-4.ci.openshift.org:6443) being unreachable (Service Unavailable). This is an infrastructure issue, not a code issue. The suite split and test routing are verified correct.

Unit Tests

  • make test-unit - Pre-existing timeout in TestWebhookCABundleInjectorSync (cabundleinjector package) — unrelated to this change

Related Work

CI Configuration

A companion PR has been created in openshift/release to add the new CI job for the disruptive suite:

Future Work

Future CA rotation test migrations (time-based-ca-rotation, forced-ca-rotation) should also use the [Disruptive] tag so they are automatically routed to the disruptive suite.

Commit

  • Branch: ote-migrate-refresh-ca
  • Commit: e60fe227d
  • Message: test/e2e: split OTE suite into stable and disruptive for refresh-CA
  • Status: Pushed to origin/ote-migrate-refresh-ca

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 13, 2026

@wangke19: This pull request references CNTRLPLANE-2549 which is a valid jira issue.

Details

In response to this:

Summary

Split the OTE suite into stable and disruptive suites to fix CI failures caused by the refresh-CA test's cluster-wide TLS disruption.

Problem

The CI job e2e-aws-operator-serial-ote fails because the refresh-CA test deletes the cluster's service CA signing key (signing-key secret in openshift-service-ca), causing cluster-wide TLS disruption. The OTE framework runs openshift-tests monitors alongside the tests that detect this disruption and report failures. The current suite does not set ClusterStability, which defaults to Stable (zero disruption expected).

Solution

Split into two suites:

  • Stable suite (openshift/service-ca-operator/operator/serial): 13 non-disruptive tests with strict cluster health monitoring (ClusterStability: Stable)
  • Disruptive suite (openshift/service-ca-operator/operator/serial-disruptive): 1 test (refresh-CA) with relaxed monitor thresholds (ClusterStability: Disruptive)

Changes

1. Tag refresh-CA test as [Disruptive] in test/e2e/e2e.go

// Before:
g.It("[Operator][Serial] should regenerate serving certs and configmaps when CA is deleted and recreated", func() {

// After:
g.It("[Operator][Serial][Disruptive] should regenerate serving certs and configmaps when CA is deleted and recreated", func() {

2. Split into two suites in cmd/service-ca-operator-tests-ext/main.go

Replace the single suite with two:

// Non-disruptive tests run with strict cluster health monitoring.
extension.AddSuite(oteextension.Suite{
   Name:             "openshift/service-ca-operator/operator/serial",
   Parallelism:      1,
   ClusterStability: oteextension.ClusterStabilityStable,
   Qualifiers: []string{
       `name.contains("[Operator]") && name.contains("[Serial]") && !name.contains("[Disruptive]")`,
   },
})

// Disruptive tests (e.g. CA rotation) that cause expected cluster-wide TLS disruption.
// Monitors will relax thresholds for this suite.
extension.AddSuite(oteextension.Suite{
   Name:             "openshift/service-ca-operator/operator/serial-disruptive",
   Parallelism:      1,
   ClusterStability: oteextension.ClusterStabilityDisruptive,
   Qualifiers: []string{
       `name.contains("[Operator]") && name.contains("[Serial]") && name.contains("[Disruptive]")`,
   },
})

Files Modified

File Change
cmd/service-ca-operator-tests-ext/main.go Split single suite into stable + disruptive suites
test/e2e/e2e.go Add [Disruptive] tag to refresh-CA test name

Verification Results

Build

  • make build - Both service-ca-operator and service-ca-operator-tests-ext binaries compiled successfully

OTE Suite Verification

  • Suite registration - list suites confirmed two suites with correct ClusterStability settings
  • Test routing - list tests confirmed 14 total tests; refresh-CA tagged [Disruptive] routed to disruptive suite

Suite breakdown:

  • Stable suite (openshift/service-ca-operator/operator/serial): 13 non-disruptive tests, ClusterStability: Stable
  • Disruptive suite (openshift/service-ca-operator/operator/serial-disruptive): 1 test (refresh-CA), ClusterStability: Disruptive

Test Execution

Mode Tests Found Result Failure Reason
OTE stable suite 13 tests (non-disruptive) Cluster unavailable (Service Unavailable)
OTE disruptive suite 1 test (refresh-CA) Cluster unavailable (Service Unavailable)

Note: All test execution failures are due to the target cluster being unreachable (Service Unavailable). This is an infrastructure issue, not a code issue. The suite split and test routing are verified correct.

Related Work

CI Configuration

A companion PR has been created in openshift/release to add the new CI job for the disruptive suite:

Future Work

Future CA rotation test migrations (time-based-ca-rotation, forced-ca-rotation) should also use the [Disruptive] tag so they are automatically routed to the disruptive suite.

Commit

  • Branch: ote-migrate-refresh-ca
  • Commit: e60fe227d
  • Message: test/e2e: split OTE suite into stable and disruptive for refresh-CA
  • Status: Pushed to origin/ote-migrate-refresh-ca


Contributor Author

@wangke19 wangke19 left a comment

Thanks for the review! You're correct that g.Ordered is overridden in CI by OTE's suite configuration. However, I'd like to keep it for local development workflows:

  1. Local go test runs: When developers run go test ./test/e2e/... locally (without OTE), g.Ordered ensures tests execute in the correct order
  2. Explicit documentation: It makes the test dependencies clear to anyone reading the code
  3. No harm in CI: Since OTE overrides it, keeping it doesn't cause any issues in CI

The tests have intentional state dependencies (CA creation, rotation, deletion), so having g.Ordered as a safety net for local testing seems valuable. What do you think?

@gangwgr

gangwgr commented Feb 13, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Feb 13, 2026
@openshift-ci

openshift-ci bot commented Feb 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gangwgr, wangke19
Once this PR has been reviewed and has the lgtm label, please assign bertinatto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

extension.AddSuite(oteextension.Suite{
    Name:             "openshift/service-ca-operator/operator/serial-disruptive",
    Parallelism:      1,
    ClusterStability: oteextension.ClusterStabilityDisruptive,
Contributor

This is interesting. Do we know which monitor test failed exactly?

Contributor Author

    Parallelism:      1,
    Name:             "openshift/service-ca-operator/operator/serial",
    Parallelism:      1,
    ClusterStability: oteextension.ClusterStabilityStable,
Contributor

What is the default ClusterStability? Could we use it ?

Contributor Author

@wangke19 wangke19 Feb 13, 2026

Good point.
ClusterStabilityStable is defined in the vendored openshift-tests-extension package at pkg/extension/types.go line 54:

ClusterStabilityStable ClusterStability = "Stable"

Yes, the default ClusterStability is Stable: when it's empty/unset, openshift-tests defaults to Stable (ref). So we can remove the explicit ClusterStabilityStable for the non-disruptive suite and only set ClusterStabilityDisruptive for the disruptive one. Will update.
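
A minimal sketch of the defaulting behavior described here, assuming it works as stated in this comment. The types below are local stand-ins, not the vendored openshift-tests-extension package; the `"Disruptive"` string value and the `effectiveStability` helper are assumptions made for illustration.

```go
package main

import "fmt"

// ClusterStability is a local stand-in for the vendored type.
type ClusterStability string

const (
	// Value quoted in the comment above.
	ClusterStabilityStable ClusterStability = "Stable"
	// Assumed value; not confirmed by the source.
	ClusterStabilityDisruptive ClusterStability = "Disruptive"
)

// effectiveStability is a hypothetical helper: an unset (empty)
// value is treated as Stable, any other value is returned unchanged.
func effectiveStability(s ClusterStability) ClusterStability {
	if s == "" {
		return ClusterStabilityStable
	}
	return s
}

func main() {
	var unset ClusterStability
	fmt.Println(effectiveStability(unset))
	fmt.Println(effectiveStability(ClusterStabilityDisruptive))
}
```

Under this assumption, omitting the field on the non-disruptive suite is equivalent to setting it to Stable explicitly, so only the disruptive suite needs an explicit value.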

@p0lyn0mial

Local go test runs: When developers run go test ./test/e2e/... locally (without OTE), g.Ordered ensures tests execute in the correct order

@wangke19 will go test ./test/e2e/... even run a ginkgo spec/test ?

@wangke19

will go test ./test/e2e/... even run a ginkgo spec/test ?

You're right — it won't. There's no Ginkgo bootstrap (RunSpecs/RegisterFailHandler) in the test package, so go test only runs the standard TestE2E function, not the Ginkgo specs in e2e.go. The Ginkgo specs are only executed via the OTE binary (service-ca-operator-tests-ext).

Since g.Ordered is overridden in CI by OTE, and go test doesn't run the Ginkgo specs at all, g.Ordered has no effect in either path. I'll remove it.

- Remove g.Ordered decorator since OTE overrides it in CI and
  go test does not run Ginkgo specs
- Remove explicit ClusterStabilityStable as it is the default
  when ClusterStability is unset
@openshift-ci openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) Feb 13, 2026
@openshift-ci

openshift-ci bot commented Feb 13, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci

openshift-ci bot commented Feb 13, 2026

@wangke19: all tests passed!


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
